Intelligent augmentation of media content

ABSTRACT

Embodiments of the present disclosure include methods (and corresponding systems and computer program products) that augment visual elements in documents with rich media content and provide the rich media content based on user interaction with the augmented visual elements in the documents. The disclosed embodiments analyze a document for qualified visual elements. The disclosed embodiments determine keywords associated with the visual element, generate an association of the visual element and the keywords, and embed the association in a corresponding augmented document. When a user reviews the augmented document in a client system and moves a pointer over the augmented visual element, a piece of rich media content related to the keywords is transmitted to the client system to be displayed as an overlay in close proximity to the visual element where the mouse-over occurred.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims priority to and is a continuation of a U.S. Non-Provisional application Ser. No. 12/033,539, entitled “Intelligent Augmentation of Media Content”, filed on Feb. 19, 2008, which claims the benefit of and priority to U.S. Provisional Patent Application Ser. No. 60/986,965, entitled “Intelligent Augmentation of Media Content”, filed on Nov. 9, 2007, both of which are hereby incorporated by reference in their entirety.

BACKGROUND

1. Field of Disclosure

The disclosure generally relates to the field of data augmentation, in particular to augmenting non-textual content in documents.

2. Description of the Related Art

As the cost associated with network storage decreases and the number of users having high-speed network access grows, more and more content providers place rich visual content (e.g., still images, videos) in their web pages. In addition, content providers often place advertisements in their web pages to generate advertising revenue. However, advertisements usually occupy prominent spaces in web pages. In addition, advertisements tend to interrupt content arrangement in the web pages and distract viewers.

Thus, the art lacks a system and method for augmenting visual content in web pages and providing augmented data based on the visual content.

SUMMARY

Embodiments of the present disclosure include methods (and corresponding systems and computer program products) that augment visual elements in documents with rich media content and provide the rich media content based on user interaction with the augmented visual elements in the documents.

The disclosed embodiments analyze a document for qualified visual elements. The disclosed embodiments determine keywords associated with the visual element, generate an association of the visual element and the keywords, and embed the association in a corresponding augmented document. When a user reviews the augmented document in a client system and moves a pointer over the augmented visual element, a piece of rich media content related to the keywords is transmitted to the client system. The related rich media content can be displayed as an overlay in close proximity to the visual element where the mouse-over occurred.

Advantages of the disclosed embodiments include providing content providers with additional channels for delivering relevant advertisements and other augmented data to viewers. The disclosed embodiments also enhance viewers' web browsing experience by providing rich media overlays without the need to leave their current web page. In addition, the disclosed embodiments provide additional rich media content on a web page without requiring a media player at a fixed location on the web page.

The features and advantages described in the specification are not all inclusive and, in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the disclosed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosed embodiments have other advantages and features which will be more readily apparent from the detailed description, the appended claims, and the accompanying drawings. A brief description of the drawings is below:

FIG. 1 illustrates one embodiment of a computing environment for augmenting visual elements in documents with rich media content and delivering the rich media content based on user interaction with the augmented visual elements in the documents.

FIG. 2 illustrates one embodiment of an architecture of an augmentation server as shown in, for example, FIG. 1.

FIGS. 3-5 are flowcharts illustrating one embodiment of a method for augmenting images in web pages with advertisements and delivering the advertisements based on user interactions with the augmented images.

FIGS. 6(a) through 6(j) are screenshots illustrating a web page, its corresponding augmented web page, and a viewer's user experience interacting with the augmented web page according to one embodiment of the present disclosure.

DETAILED DESCRIPTION

The disclosed embodiments describe examples of a method (and corresponding system and computer program product) for augmenting visual elements in documents with rich media content and delivering the rich media content based on user interaction with the augmented visual elements in the documents. The visual elements include images, videos, and other graphical displays (e.g., animations such as Flash). The documents include any form of documents such as web pages. The rich media content includes services (e.g., convenient Internet search service), hyperlinks, graphical displays, video playbacks, and advertisements. For purposes of illustration, the method (and corresponding system and computer program product) is described in terms of augmenting images in web pages with advertisements and delivering the advertisements based on user interactions with the augmented images, even though the disclosed embodiments apply to all other types of visual elements, documents, and rich media content as defined above.

The figures and the following description relate to preferred embodiments by way of illustration only. Reference will now be made in detail to several embodiments, examples of which are illustrated in the accompanying figures. It is noted that wherever practicable similar or like reference numbers may be used in the figures and may indicate similar or like functionality. The figures depict embodiments of the disclosed system (or method) for purposes of illustration only. It should be noted that from the following discussion, alternate embodiments of the structures and methods disclosed herein will be readily recognized by one skilled in the art as viable alternatives that may be employed without departing from the principles described herein.

Computing Environment

FIG. 1 illustrates one embodiment of a computing environment 100 for augmenting images in web pages with advertisements and delivering the advertisements based on user interaction with the augmented images. As illustrated, the computing environment 100 includes an augmentation server 110, multiple content providers (or websites) 120, and one or more client computers (or user systems) 130, all of which are communicatively coupled through a network 140.

The augmentation server 110 is configured to augment images (and other types of visual elements) in web pages (and other types of documents) with advertisements (and other types of rich media content), and deliver the advertisements based on user interaction with the augmented images. The augmentation server 110 retrieves web pages from the content providers 120 and augments the web pages. The augmentation server 110 augments a web page by analyzing it for qualified images and their one or more related contexts and subjects, associating (or tagging) the images with one or more related contexts and subjects, and storing the associations in a database. When a user views an augmented web page in a client computer 130 and moves a pointer over one of the tagged (or augmented) images (hereinafter “the active image”), the augmentation server 110 provides an advertisement related to the context and/or subject of the active image for display in the client computer 130 as an overlay of the active image. An example architecture of the augmentation server 110 is described in detail below with respect to FIG. 2.

The content providers 120 are entities that provide (or generate), host, publish, control, or otherwise have rights over a collection of web pages (or other types of documents). In one embodiment, the content providers 120 are web servers hosting web pages for viewers to access. The content providers 120 may provide web pages to the augmentation server 110 for augmentation. Alternatively, the content providers 120 may either instruct or give permission to the augmentation server 110 to retrieve all or parts of their web pages for augmentation.

The client computers 130 are client devices for users to browse web pages (or other types of documents). In one embodiment, a client computer 130 includes a pointer device (e.g., a mouse, a trackball, a roller, a touchpad, or the like), a conventional web browser (e.g., Microsoft Internet Explorer™, Mozilla Firefox™, or Apple Safari™), and can retrieve and display web pages from the content providers 120 in a conventional manner (e.g., using the HyperText Transfer Protocol). In one embodiment, the client computer 130 displays an augmented web page in a manner identical (or substantially similar) to the corresponding original web page. When a user moves a pointer (e.g., mouse pointer) over an augmented image in the augmented web page, the client computer 130 (or the utilized web browser) generates a request and transmits the request to the augmentation server 110 for a relevant advertisement. The client computer 130 (or the utilized web browser) displays the retrieved advertisement as an overlay proximate to the active image.

The network 140 is configured to communicatively connect the augmentation server 110, the content providers 120, and the client computers 130. The network 140 may be a wired or wireless network. Examples of the network 140 include the Internet, an intranet, a WiFi network, a WiMAX network, a mobile telephone network, or a combination thereof.

In one embodiment, the augmentation server 110, the content providers 120, and/or the client computers 130 are structured to include a processor, memory, storage, network interfaces, and applicable operating system and other functional software (e.g., network drivers, communication protocols).

Example Augmentation Server Architectural Overview

Referring next to FIG. 2, a block diagram illustrates an example architecture of the augmentation server 110 shown in FIG. 1. As illustrated, the augmentation server 110 includes an input/output module 210, an image augmentation module 220, an advertisement delivery module 230, and an augmentation and tracking database 240. The modules 210 through 240 may include software or firmware instructions that can be stored within a tangible computer readable medium (e.g., magnetic disk drive, flash memory, or random-access memory) and executed by a processor or equivalent electrical circuits, microcode, or the like.

The input/output module 210 is configured to communicate with external devices (or entities) such as the content providers 120 and the client computers 130 through communication channels such as the network 140. In one embodiment, the input/output module 210 receives web pages (or other types of documents) from the content providers 120, or retrieves (or crawls) websites for web pages. The input/output module 210 transmits to the content providers 120 augmented web pages or information enabling the content providers 120 to augment the web pages. The input/output module 210 also receives requests (or signals) from client computers 130 indicating user interactions with the augmented web pages, and transmits to the client computers 130 related advertisements for display.

The image augmentation module 220 is configured to augment images (and other types of visual elements) in web pages (and other types of documents) with advertisements (and other types of rich media content). As illustrated, the image augmentation module 220 includes an image detection sub-module 222 and a context discovery sub-module (or knowledge engineering module) 224. The image detection sub-module 222 is configured to detect qualified images in the web pages. One embodiment of an operation for the image detection sub-module 222 to detect qualified images is described in detail below with respect to FIG. 3.

The context discovery sub-module 224 is configured to identify contexts and/or subject keywords related to the qualified images in the web pages. A context of a web page (or an image) is a circumstance relevant to the content of the web page (or the subject of the image). A subject of an image is a subject matter of the content of the image. An image may have one or more contexts and/or subjects. A context and/or a subject can be defined (or described) using one or more keywords. As used herein, both context keywords and subject keywords are collectively called keywords for ease of discussion. The context discovery sub-module 224 may process information associated with the image to identify related keywords. Alternatively (or in addition), the context discovery sub-module 224 may receive context information from the content providers 120. One embodiment of an operation for the context discovery sub-module 224 to identify keywords is described in detail below with respect to FIG. 4.

The image augmentation module 220 is configured to generate intelligent tags for the qualified images, and augment the qualified images with the intelligent tags. In one embodiment, an intelligent tag uniquely identifies its associated image (and optionally the associated web page). An intelligent tag may also include some or all of the keywords identified for the associated image. The image augmentation module 220 may integrate the intelligent tags into the web pages or provide them to the content providers 120 for integration. Web pages with the integrated intelligent tags are called augmented web pages. Images with the integrated intelligent tags are called augmented images. The image augmentation module 220 also stores the identified keywords together with identifiers of the associated images in the augmentation and tracking database 240 for later reference.

The advertisement delivery module 230 is configured to provide related advertisements (or other types of rich media content) based on user interaction with augmented images in augmented web pages. The advertisement delivery module 230 receives an intelligent tag request indicating a user interaction with an augmented image (e.g., moving a mouse pointer over the augmented image) from a client computer 130 through the input/output module 210. The advertisement delivery module 230 retrieves keywords associated with the active image, and determines one or more relevant advertisements matching the keywords in an advertising database (not shown). The advertisement delivery module 230 provides the relevant advertisements to the requesting client computer 130 for display. Alternatively, the advertisement delivery module 230 transmits addresses (e.g., Uniform Resource Locators (URLs)) of the relevant advertisements to the requesting client computer 130 for retrieval.

The augmentation and tracking database 240 (hereinafter “the database 240”) is configured to store identifiers of the augmented images (e.g., URL) and the associated keywords. In one embodiment, the database 240 also serves as the advertising database and hosts a table of advertisement identifiers, their addresses, format information, and associated keywords.
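By way of a non-limiting illustration, the storage role of the database 240 could be sketched as follows in Python using an in-memory SQLite database. The table names, column names, and helper functions (`create_database`, `store_image_keywords`, `keywords_for_image`) are assumptions introduced here for illustration only; the disclosure does not prescribe a schema or a database engine.

```python
import sqlite3


def create_database(conn: sqlite3.Connection) -> None:
    """Create hypothetical tables for augmented images and advertisements."""
    conn.executescript(
        """
        CREATE TABLE image_keywords (
            image_id TEXT NOT NULL,   -- identifier of the augmented image, e.g. its URL
            keyword  TEXT NOT NULL,   -- a context or subject keyword
            PRIMARY KEY (image_id, keyword)
        );
        CREATE TABLE advertisements (
            ad_id   TEXT PRIMARY KEY,
            address TEXT NOT NULL,    -- where the advertisement can be retrieved
            format  TEXT NOT NULL     -- format information, e.g. 'banner' or 'video'
        );
        CREATE TABLE ad_keywords (
            ad_id   TEXT NOT NULL REFERENCES advertisements(ad_id),
            keyword TEXT NOT NULL,
            PRIMARY KEY (ad_id, keyword)
        );
        """
    )


def store_image_keywords(conn, image_id, keywords):
    """Record the keywords identified for one augmented image (lowercased)."""
    conn.executemany(
        "INSERT OR IGNORE INTO image_keywords (image_id, keyword) VALUES (?, ?)",
        [(image_id, keyword.lower()) for keyword in keywords],
    )


def keywords_for_image(conn, image_id):
    """Return the stored keywords for an image, sorted alphabetically."""
    rows = conn.execute(
        "SELECT keyword FROM image_keywords WHERE image_id = ? ORDER BY keyword",
        (image_id,),
    )
    return [row[0] for row in rows]
```

Keeping the advertisement tables alongside the image table mirrors the embodiment in which the database 240 also serves as the advertising database; a deployment with a separate advertising database would simply split these tables.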

The components of the augmentation server 110 can reside on a single computer system or several computer systems located close by or remotely from each other. For example, the image augmentation module 220 and the advertisement delivery module 230 may reside on separate web servers, and the database 240 may be located in a dedicated database server. In addition, any of the components or sub-components may be executed in one or multiple computer systems.

Overview of Methodology

Referring next to FIGS. 3 through 5, flowcharts collectively illustrate an example method (or process) for augmenting images in web pages with advertisements and delivering the advertisements based on user interactions with the augmented images.

In one embodiment, the illustrated method (or any of the sub-methods 300, 400, and 500) is implemented in a computing environment such as the computing environment 100. One or more portions of the method may be implemented in embodiments of hardware and/or software or combinations thereof. By way of example, the illustrated method may be embodied through instructions for performing the actions described herein and such instructions can be stored within a tangible computer readable medium and are executable by a processor. Alternatively (or additionally), the illustrated method may be implemented in modules like those in the augmentation server 110 described above with respect to FIGS. 1 and 2 and/or other entities such as the content providers 120 and/or the client computers 130. Furthermore, those of skill in the art will recognize that other embodiments can perform the steps of the illustrated method in a different order. Moreover, other embodiments can include different and/or additional steps than the ones described here.

As illustrated in FIG. 3, initially, the augmentation server 110 retrieves 310 web pages from the content providers 120 (e.g., through the input/output module 210). The content providers 120 may provide web pages for augmentation to the augmentation server 110 (e.g., by transmitting the web pages to the augmentation server 110). Alternatively, the augmentation server 110 may crawl websites to retrieve 310 web pages. For example, the content providers 120 may either instruct or give permission to the augmentation server 110 to retrieve all or parts of their web pages for augmentation. Accordingly, the augmentation server 110 retrieves 310 the web pages as identified by the content providers 120.

The augmentation server 110 analyzes 320 the retrieved web pages for qualified images within the web pages (e.g., through the image detection sub-module 222). In one embodiment, the augmentation server 110 enumerates through all images in the web pages for those that meet predefined criteria. The predefined criteria for qualified images include limitations such as a minimum width and a minimum height. Small (or flat or narrow) images tend not to be of interest to viewers (e.g., border line patterns). Only images that equal or exceed both the minimum width and the minimum height are potentially qualified images. The predefined criteria may also disqualify certain types of images (e.g., advertising images, background images, clickable images, or images that are otherwise associated with certain tags). The predefined criteria are customizable and the augmentation server 110 may apply different sets of predefined criteria for different content providers 120 based on customer requests and/or needs.
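By way of a non-limiting illustration, the analysis step 320 could be sketched in Python as below. The sketch makes several assumptions that are not part of the disclosure: image dimensions are read from the WIDTH and HEIGHT attributes of each IMG tag (a real system might instead inspect the image file), advertising and background images are recognized by hypothetical CSS class names, and an image is treated as clickable when it is nested inside an anchor element. The names `Criteria`, `ImageCollector`, and `find_qualified_images`, and the 100-pixel defaults, are likewise illustrative.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class Criteria:
    """Customizable per-provider criteria (default values are assumptions)."""
    min_width: int = 100
    min_height: int = 100
    excluded_classes: frozenset = frozenset({"ad", "background"})
    allow_clickable: bool = False


class ImageCollector(HTMLParser):
    """Collects the attributes of every IMG tag and notes enclosing anchors."""

    def __init__(self):
        super().__init__()
        self.images = []
        self._anchor_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._anchor_depth += 1
        elif tag == "img":
            info = dict(attrs)
            info["clickable"] = self._anchor_depth > 0
            self.images.append(info)

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_depth > 0:
            self._anchor_depth -= 1


def _dimension(value):
    """Parse a pixel dimension; missing or malformed values count as 0."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def find_qualified_images(html, criteria=Criteria()):
    """Return the SRC of each image that meets the predefined criteria."""
    collector = ImageCollector()
    collector.feed(html)
    qualified = []
    for img in collector.images:
        if _dimension(img.get("width")) < criteria.min_width:
            continue  # too narrow
        if _dimension(img.get("height")) < criteria.min_height:
            continue  # too flat
        if set((img.get("class") or "").split()) & criteria.excluded_classes:
            continue  # disqualified image type
        if img["clickable"] and not criteria.allow_clickable:
            continue  # clickable images are disqualified by default
        qualified.append(img.get("src"))
    return qualified
```

Passing a different `Criteria` instance per content provider corresponds to applying different sets of predefined criteria for different content providers 120.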

The augmentation server 110 contextualizes 330 the qualified images in the web pages (e.g., through the context discovery sub-module 224). In one embodiment, the augmentation server 110 determines (or identifies) context keywords and/or subject keywords for each qualified image in the web pages. The contextualization step 330 is further described with respect to FIG. 4.

Referring now to FIG. 4, a flowchart illustrates an example method 400 for the augmentation server 110 to contextualize 330 a qualified image in a web page. As illustrated, the augmentation server 110 determines 410 whether contexts for the web page (or the qualified image) are available. As defined above with respect to the context discovery sub-module 224 in FIG. 2, a context of a web page (or image) is a circumstance relevant to the content of the web page (or the subject of the image). A web page (or image) can have zero, one, or more contexts. A context can be defined or described using one or more keywords. For example, a web page about the life of the world-renowned author Ernest Hemingway may have a context of “literature.” The augmentation server 110 may determine 410 whether contexts of the web page (or the qualified image) are available from its content provider 120. If the augmentation server 110 determines 410 that the contexts are available, it retrieves the available contexts and proceeds to determine 430 whether subjects for the qualified image are available, as described in detail below.

If the augmentation server 110 determines 410 that the contexts are not available, it identifies 420 contexts for the web page (or the qualified image) by analyzing content of the web page. In one embodiment, the augmentation server 110 uses natural language processing technologies to process textual content of the web page to determine contexts of the web page (or the qualified image). For example, if the web page contains content about works and life of Ernest Hemingway, the augmentation server 110 identifies 420 a literature context for the web page and/or the enclosed qualified image. Alternatively (or in addition), the augmentation server 110 can analyze related web pages (e.g., web pages linking to/from the web page, or web pages from the same web site) to identify 420 a particular context or contexts.

The augmentation server 110 determines 430 whether subjects of the qualified image are available. As defined above with respect to the context discovery sub-module 224 in FIG. 2, an image may have one or more subjects described by one or more keywords. For example, a photograph of Ernest Hemingway may have a subject keyword of “Ernest Hemingway,” “author,” or both. The augmentation server 110 may determine 430 whether subjects of the qualified image are available from its content provider 120. If the augmentation server 110 determines 430 that the subjects are available, it retrieves the available subjects and proceeds to validate 490 the contexts and/or the subjects, as described in detail below.

If the augmentation server 110 determines 430 that the subjects are not available, it analyzes 440 tags associated with the qualified image. In one embodiment, the augmentation server 110 extracts ALT tags associated with the qualified image for the subjects. An ALT tag (strictly, the ALT attribute of an HTML element) provides alternative text for non-textual elements, typically images, in an HTML document (e.g., a web page). ALT tags are often used to provide descriptive information of the associated images. If the qualified image has associated ALT tags available, the augmentation server 110 analyzes their contents to extract relevant subject keywords.

The augmentation server 110 determines 450 whether the subjects for the qualified image are available from the associated ALT tags. If the augmentation server 110 extracts subjects by analyzing 440 ALT tags associated with the qualified image, it proceeds to validate 490 the contexts and/or the subjects, as described in detail below.

If the qualified image has no associated ALT tags, or the augmentation server 110 fails to extract any subject keywords from the ALT tags (e.g., the associated ALT tags contain no relevant keywords), the augmentation server 110 analyzes 460 contents of text nodes physically close to the qualified image on the web page for relevant subjects. For example, the augmentation server 110 may analyze the associated caption, descriptive text, and hyperlink references (or anchor texts). For instance, in a web page about Ernest Hemingway, a photograph of the famous writer has a caption of “Photograph of Ernest Hemingway aboard his yacht” and is the destination of a hyperlink with anchor text “Ernest Hemingway in his 50's.” The augmentation server 110 may use the caption and/or the anchor text to determine subjects for the photograph. In one embodiment, the augmentation server 110 may be configured to restrict the number of paragraphs to be analyzed for subjects in a web page to prevent identifying too many subjects. In one embodiment, the augmentation server 110 uses hyperlinks (e.g., other hyperlinks to the image from other pages) to determine subjects for the qualified image.

The augmentation server 110 determines 470 whether the subjects of the qualified image are available from the nearby text nodes. If the augmentation server 110 identifies subjects by analyzing 460 nearby text nodes, it proceeds to validate 490 the contexts and/or the subjects, as described in detail below.

If the augmentation server 110 fails to identify subjects of the qualified image by analyzing 460 nearby text nodes, the augmentation server 110 analyzes 480 areas of interest in the web page for the subjects. An area of interest is a field (e.g., a markup field such as a heading or a title) of the web page containing information relevant to the qualified image. The area of interest may be provided by the content provider 120 or identified by the augmentation server 110 (e.g., by analyzing web pages from the same web site). The augmentation server 110 analyzes 480 the areas of interest in the web page to identify subjects for the qualified image.

The augmentation server 110 validates 490 the identified (or determined) contexts and/or the subjects. In one embodiment, the augmentation server 110 verifies the integrity (and quality) of the contexts and the subjects. For example, the augmentation server 110 may determine whether the subjects (e.g., Ernest Hemingway) and the contexts (e.g., literature) are relevant. If they are not relevant, the augmentation server 110 may determine that the subjects and/or the contexts are considered not valid and may repeat the process 400. In one embodiment, the augmentation server 110 may determine a relevance of the context keywords and/or the subject keywords with the qualified image in view of the analyzed text, and sort them by relevance.

It is noted that in one embodiment, the contexts and/or the subjects are determined even if they are provided by the content provider. In one embodiment, two or more steps of determining subjects 440, 460, 480 are executed even if subjects are determined 430 available or identified by one or more of the steps 440, 460, 480. In one embodiment, if contexts and/or subjects are determined to be not valid, the augmentation server 110 proceeds with one or more of the following steps: ignoring (or removing) the not valid keywords; repeating the process 400; or disqualifying the web page and/or the qualified image.
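By way of a non-limiting illustration, the ordered fallback among the subject sources of FIG. 4 (steps 430 through 480) could be sketched in Python as below, for the embodiment in which the first source that yields subjects is used. The function names and the stopword list are assumptions introduced here; in particular, `extract_keywords` is a deliberately naive placeholder and does not represent the natural language processing or validation 490 described above.

```python
from typing import List, Optional, Sequence, Tuple

# Illustrative stopword list; a real system would use a fuller resource.
STOPWORDS = frozenset({"a", "an", "the", "of", "in", "on", "and", "his", "her"})


def extract_keywords(text: str) -> List[str]:
    """Placeholder extraction: lowercase words, minus stopwords and duplicates."""
    seen = set()
    keywords = []
    for raw in text.split():
        word = raw.strip(".,;:!?\"'()").lower()
        if word and word not in STOPWORDS and word not in seen:
            seen.add(word)
            keywords.append(word)
    return keywords


def determine_subjects(
    provided: Optional[Sequence[str]],
    alt_text: Optional[str],
    nearby_text: Optional[str],
    area_of_interest_text: Optional[str],
) -> Tuple[str, List[str]]:
    """Return (source, subject keywords) from the first source that yields any."""
    # Step 430: subjects made available by the content provider.
    if provided:
        return "provider", list(provided)
    # Steps 440/450, 460/470, and 480, tried in that order.
    for source, text in (
        ("alt", alt_text),
        ("nearby_text", nearby_text),
        ("area_of_interest", area_of_interest_text),
    ):
        keywords = extract_keywords(text or "")
        if keywords:
            return source, keywords
    return "none", []
```

The embodiment in which two or more of the steps 440, 460, 480 are executed regardless could be obtained by collecting keywords from every source instead of returning at the first non-empty result.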

Referring back to FIG. 3, the augmentation server 110 augments 340 the web pages with intelligent tags (e.g., through the image augmentation module 220). The augmentation server 110 generates intelligent tags for the qualified images based on their identified keywords, and tags the qualified images with the generated intelligent tags. The intelligent tags contain information about the associated qualified images, such as subject keywords, context keywords, and image identifiers. The intelligent tags may contain multiple sections (hereinafter called IntelliImage sections) for additional information. For example, the intelligent tags may contain requirement (or preference) information about rich media content to be associated with the image, such as types of advertisements or rich media content, and whether the qualified image is clickable.

In one embodiment, the augmentation server 110 integrates the intelligent tags into the web pages as image tags associated with the qualified images. For example, for an image of an Apple iPhone™, the augmentation server 110 may place an intelligent tag with subject keywords “Apple iPhone” in an ALT tag in the accompanying IMG tag, as illustrated below.

<IMG SRC="phone.jpg" ALT="Apple iPhone">

Alternatively, the augmentation server 110 may add the intelligent tag in a different section or tag associated with the qualified image. In one embodiment, the augmentation server 110 transmits the intelligent tags (or their contents) to the content provider 120 for integration. In one embodiment, the intelligent tags do not affect the display of the augmented web pages.
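By way of a non-limiting illustration, generating an intelligent tag and integrating its subject keywords into the ALT text of an IMG tag could be sketched in Python as below. The `IntelligentTag` fields and the `render_img` helper are assumptions introduced here; the disclosure does not fix the internal layout of an intelligent tag. Attribute values are escaped so that keywords containing quotation marks cannot break the markup.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class IntelligentTag:
    """Hypothetical in-memory form of an intelligent tag."""
    image_id: str                                   # uniquely identifies the image
    subject_keywords: List[str] = field(default_factory=list)
    context_keywords: List[str] = field(default_factory=list)
    clickable: bool = False                         # example preference information


def render_img(src: str, tag: IntelligentTag) -> str:
    """Render an IMG tag whose ALT text carries the tag's subject keywords."""
    alt = escape(" ".join(tag.subject_keywords), quote=True)
    return f'<IMG SRC="{escape(src, quote=True)}" ALT="{alt}">'
```

For the iPhone example above, `render_img("phone.jpg", IntelligentTag("phone.jpg", ["Apple iPhone"]))` produces an IMG tag with SRC "phone.jpg" and ALT "Apple iPhone".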

In one embodiment, the augmentation server 110 (or the content providers 120) also augments the web pages by including computer code (hereinafter called client code) to monitor and report viewers' interactions with the augmented images. The computer code can be in any computer language, such as JavaScript. Additional functions of the client code are described in detail below with respect to FIG. 5.

The augmentation server 110 stores information about the augmentation in the database 240 for future reference. The stored information includes information about the augmented web pages (e.g., their URL and included augmented images), the augmented images (e.g., their size, type, resolution, image identifier, whether clickable, or identified context and subject keywords), and the related advertisements (e.g., types of the advertisements).

The augmentation server 110 delivers 350 relevant advertisements in response to user interaction with augmented images. The delivery step 350 is described in detail below with respect to FIG. 5.

Referring now to FIG. 5, a flowchart illustrates an example method 500 for the augmentation server 110 to deliver 350 a relevant advertisement responding to a user interaction with an augmented image in an augmented web page. As illustrated, a content provider 120 (or a related entity, such as a web hosting provider) transmits 510 an augmented web page to a client computer 130. For example, a user of the client computer 130 may enter the URL of an augmented web page (or the corresponding original web page) in the address bar of a conventional web browser (e.g., Microsoft Internet Explorer™, Mozilla Firefox™, or Apple Safari™). The web browser of the client computer 130 (hereinafter called the client web browser) consequently transmits a request for the web page to a corresponding content provider 120. Responding to the request, the content provider 120 transmits 510 the augmented web page to the client web browser for display. In one embodiment, the client web browser displays augmented images in the web pages in a manner identical to (or closely resembling) the corresponding original images. In another embodiment, the augmented image may be highlighted in the augmented web page as displayed in the client web browser.

The augmentation server 110 receives 520 an intelligent tag request from the client computer 130. As described above with respect to FIG. 3, the augmented web page contains client code that monitors user interactions with augmented images. In one embodiment, if the user moves a pointer (e.g., using a mouse or touchpad) over an augmented image (the active image), the client code (or the web browser) generates an intelligent tag request and transmits the request to the augmentation server 110. The request indicates the mouse-over user activity to the augmentation server 110. The request may contain information that uniquely identifies the active image (e.g., an image identifier such as a URL), and/or other information such as associated advertisement types.
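By way of a non-limiting illustration, one possible encoding of an intelligent tag request is an HTTP query string carrying the image identifier and, optionally, an advertisement type. The Python sketch below shows both the building and the server-side parsing of such a request. The endpoint URL, the parameter names `image` and `ad_type`, and the function names are assumptions introduced here; the disclosure does not specify a wire format, and the client code itself may be written in another language such as JavaScript.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def build_tag_request(server_url: str, image_id: str,
                      ad_type: Optional[str] = None) -> str:
    """Build a request URL that identifies the active image (hypothetical format)."""
    params = {"image": image_id}
    if ad_type is not None:
        params["ad_type"] = ad_type
    return f"{server_url}?{urlencode(params)}"


def parse_tag_request(request_url: str) -> Dict[str, Optional[str]]:
    """Extract the image identifier and optional advertisement type."""
    query = parse_qs(urlsplit(request_url).query)
    if "image" not in query:
        raise ValueError("intelligent tag request lacks an image identifier")
    return {
        "image": query["image"][0],
        "ad_type": query.get("ad_type", [None])[0],
    }
```

Because the image identifier may itself be a URL, it is percent-encoded when the request is built and decoded again when the request is parsed.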

The augmentation server 110 determines 530 an advertisement relevant to the active image for the received request based on keywords associated with the active image. The augmentation server 110 extracts the image identifier from the request, and retrieves corresponding keywords and/or other information (e.g., image size or advertisement type) for the active image from the database 240. The augmentation server 110 determines 530 an advertisement related to the retrieved keywords by searching for the advertisement in an advertisement database using the retrieved keywords. In one embodiment, the augmentation server 110 identifies the advertisement that matches the best (e.g., matching the greatest number of keywords or matching the most important keywords) as the relevant advertisement. In one embodiment, the augmentation server 110 only searches for advertisements that match the advertisement type and/or other requirements associated with the active image (e.g., image size).
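By way of a non-limiting illustration, the determination step 530 could be sketched in Python as below for the embodiment that selects the advertisement matching the greatest number of keywords, optionally restricted to a required advertisement type. The `Advertisement` record and `select_advertisement` function are assumptions introduced here; advertisement keywords are assumed to be stored in lowercase, ties are resolved in favor of the earlier advertisement, and weighting by keyword importance is not shown.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Advertisement:
    """Hypothetical advertisement record; keywords are assumed lowercase."""
    ad_id: str
    ad_type: str
    keywords: FrozenSet[str]


def select_advertisement(
    image_keywords: Iterable[str],
    ads: Iterable[Advertisement],
    required_type: Optional[str] = None,
) -> Optional[Advertisement]:
    """Return the advertisement sharing the most keywords with the active image."""
    wanted = {keyword.lower() for keyword in image_keywords}
    best, best_score = None, 0
    for ad in ads:
        if required_type is not None and ad.ad_type != required_type:
            continue  # skip advertisements of the wrong type
        score = len(wanted & ad.keywords)
        if score > best_score:  # strict comparison keeps the earlier ad on ties
            best, best_score = ad, score
    return best
```

An advertisement that shares no keyword with the active image is never selected, so the function returns `None` when nothing relevant is available.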

In one embodiment, the augmentation server 110 generates computer code (hereinafter called the advertisement code) to facilitate user interaction with the advertisement. Similar to the client code, the advertisement code can be in any computer language, such as JavaScript. For example, the advertisement code may overlay the active image with an advertisement banner, and display the advertisement in a popup box when the user moves a pointer over the banner. The advertisement code may also make the advertisement clickable through to the advertiser's destination page. In addition, if the advertisement contains video, the advertisement code may overlay the active image with a video player with video controls (e.g., forward, rewind, play/pause, volume, etc.).

The augmentation server 110 transmits 540 the relevant advertisement to the client computer 130 for display. In one embodiment, the augmentation server 110 retrieves the advertisement from a database hosting the advertisement (e.g., the advertising database), and transmits 540 it to the client web browser (or the client computer 130) for display. Alternatively, the augmentation server 110 may transmit a reference to the advertisement (e.g., its URL) to the client web browser for retrieval. The augmentation server 110 also transmits the generated advertisement code to the client web browser.

In one embodiment, the client web browser displays 550 the relevantadvertisement proximate to the active image as an in-image overlay. Asnoted above, the augmentation server 110 may augment images (or othervisual elements) with rich media content. Examples of the rich mediacontent include search mechanisms, videos, and interactiveadvertisements. In one embodiment, the client web browser overlays abanner on the active image with information about the accompanying richmedia content. The banner may include a hook icon that relates to thetype of rich media content being shown. For example, the hook icon forsearch mechanisms is a magnifying glass, the hook icon for videodisplays is a video control such as a play button, and the hook icon forshopping advertisements is a shopping cart icon, and the hook icon forall other types of rich media content can be an arrow icon. When a usermoves a mouse over the hook icon, a popup box will be displayedproximate to the position where the mouse-over is occurring. The richmedia content (e.g., the relevant advertisement) is displayed in thepopup box. The popup box may integrate controls to the users. Forexample, if rich media such as video is served, the JavaScript code mayintegrate in rich media controls (e.g., forward, rewind, play/pause,volume, etc.). It is noted that in alternate embodiments the rich mediacan be immediately played in response to mouse-over within closeproximity to the banner and/or the popup box. The overlaid banner andpopup box may be generated or otherwise controlled by the advertisementcode or the client code.

In one embodiment, the displayed rich media content is a search mechanism with a search box for users to enter query terms. For the user's convenience, the search mechanism may pre-enter relevant search terms in the search box. The pre-entered search terms may include keywords identified for the active image, keywords extracted from tags of the active image, or areas of interest. In one embodiment, the search mechanism may provide a set of rules (hereinafter called display modifiers) for extracting search terms from tags and areas of interest. For example, where the image (e.g., ALT) tag of the active image contains a subject of the image plus “review,” the display modifier may remove the term “review” before placing the content of the image tag in the search box.
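A display modifier of the kind described above could look like the following Python sketch, which strips listed terms from ALT text before it is pre-entered in the search box. The function name and the drop_terms parameter are assumptions for illustration; the disclosure does not define how display modifiers are expressed.

```python
import re


def apply_display_modifier(alt_text, drop_terms=("review",)):
    """Remove whole-word occurrences of drop_terms from ALT text."""
    # Word boundaries keep longer words such as "reviewer" intact.
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in drop_terms) + r")\b"
    cleaned = re.sub(pattern, "", alt_text, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by removed terms.
    return " ".join(cleaned.split())
```

For instance, ALT text of "BMW 745i review" would be reduced to "BMW 745i" before being placed in the search box.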

The augmentation server 110 tracks 560 the received requests and/or the advertisement displays. These activities may be logged in the database 240 or reported to another device or person (e.g., via email).
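One simple way to picture this tracking step is an in-memory tally of requests and displays, sketched below in Python. The InteractionTracker class and its method names are hypothetical; a real deployment would presumably persist these counts in the database 240 rather than hold them in memory.

```python
from collections import Counter


class InteractionTracker:
    """Counts tag requests and advertisement displays (hypothetical helper)."""

    def __init__(self):
        self.requests = Counter()  # image identifier -> number of requests
        self.displays = Counter()  # (image identifier, ad identifier) -> displays

    def record_request(self, image_id):
        self.requests[image_id] += 1

    def record_display(self, image_id, ad_id):
        self.displays[(image_id, ad_id)] += 1

    def report(self):
        """Return a plain-text summary, e.g., for a log entry or an email body."""
        lines = [
            f"{image}: {count} request(s)"
            for image, count in sorted(self.requests.items())
        ]
        return "\n".join(lines)
```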

Example Process and Screen Shots

The methods described above with respect to FIGS. 3-5 are illustrated below in an example together with accompanying FIGS. 6(a) through 6(j).

Initially, the augmentation server 110 retrieves 310 a web page 600 for augmenting embedded media content. The web page 600 may contain media content of any subject, such as a photo of a new car (e.g., BMW 745i) or an image of a landscape oil painting. As shown in FIG. 6(a), the web page 600 is retrieved from website people.com and contains information about popular culture singer Vanessa Hudgens.

The augmentation server 110 analyzes 320 the web page 600 for qualified images. As shown in FIG. 6(a), there are several images in the web page 600, such as images 610 through 650. The augmentation server 110 determines that images 630 through 650 do not have the required minimum width and/or height and therefore are not qualified images. The augmentation server 110 further determines that image 620 is an advertisement and therefore not qualified. The augmentation server 110 identifies the image 610, a photo of Vanessa Hudgens, as a qualified image.
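The qualification test in this example can be sketched as a filter over image records. The Python below is illustrative only: the Image record, the qualified_images function, and the 200-pixel thresholds are assumptions, since the disclosure does not state specific minimum dimensions or how advertisements are recognized.

```python
from dataclasses import dataclass


@dataclass
class Image:
    """Hypothetical record for an image found in a web page."""
    image_id: int
    width: int
    height: int
    is_advertisement: bool = False


def qualified_images(images, min_width=200, min_height=200):
    """Keep images that meet both minimum dimensions and are not ads."""
    return [
        img
        for img in images
        if img.width >= min_width
        and img.height >= min_height
        and not img.is_advertisement
    ]
```

Applied to records resembling images 610 through 650, with 620 flagged as an advertisement and 630 through 650 below the thresholds, the filter would return only image 610.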

The augmentation server 110 contextualizes the qualified image 610. The augmentation server 110 examines tags and text associated with the image 610 and determines that the subject of the image 610 is Vanessa Hudgens. For example, the augmentation server 110 may analyze the descriptive content titled “Fan Club” that is to the right of the image 610 to determine that the image is about Vanessa Hudgens.

The augmentation server 110 augments the web page 600 by generating an intelligent tag that uniquely identifies the image 610, and integrating the intelligent tag in the web page 600. For purposes of clarity, the augmented web page is referred to as web page 600′ (as illustrated in FIGS. 6(b) through 6(j)). The augmentation server 110 also includes in the augmented web page 600′ JavaScript code that captures user interaction with the image 610.

After the web page 600 is augmented, a web browser running on a client computer 130 can now retrieve the augmented web page 600′ and display it to a user (e.g., responding to the user entering “www.people.com” in the address bar of the web browser). FIG. 6(b) illustrates a screenshot of the augmented web page 600′ as displayed on an Internet Explorer™ web browser after it is retrieved by the browser. It is noted that the augmented web page 600′ is displayed in the same manner as the original web page 600 would be displayed in the web browser.

Subsequently, a user may move a pointer (e.g., controlled by a mouse, stylus, or touchpad) over the image 610. This user action is also referred to as a mouse-over. Detecting the mouse-over, the embedded JavaScript code in the augmented web page 600′ (or the web browser) generates an intelligent tag request uniquely identifying the image 610, and transmits the request to the augmentation server 110. The augmentation server 110 receives 520 the request, retrieves stored context and subject information (e.g., “Vanessa Hudgens”) of the image 610, and searches in an advertising database for related advertisements or rich media content. The augmentation server 110 transmits 540 the related advertisement(s) and/or rich media content back to the web browser for display.

The web browser displays 550 the received advertisement(s) (or rich media content) as an overlay in proximity to the image 610. As illustrated in FIG. 6(c), the user has moved a mouse pointer to the lower right corner of the image 610. The web browser receives an interactive banner that reads “Related information to Vanessa Hudgens” and overlays it on the bottom of the image 610 proximate to the pointer. Also displayed on the banner is a tool tip icon (the arrow icon) through which the user can launch a toolbox (or popup box) for additional rich media content.

As illustrated in FIG. 6(d), when the user moves the mouse pointer over (or clicks) the tool tip icon, the web browser overlays a toolbox above the banner displaying an interactive advertisement for a movie starring Vanessa Hudgens, “High School Musical 2.” As indicated in the advertisement, the user may interact with the advertisement by a mouse-over on the image of High School Musical 2.

As described above, the augmentation server 110 may transmit 540 other types of rich media content to be overlaid on the image 610. For example, the rich media content may be retail or shopping related rich content. As illustrated in FIG. 6(e), the augmentation server 110 may transmit 540 a banner with shopping related information for the web browser to overlay on the image 610. The banner reads “Shop for Entertainment Center” and has a tool tip icon in the shape of a shopping cart. A subsequent mouse-over on the shopping cart (or the banner) generates an overlay for a retail site where an entertainment center may be purchased, as illustrated in FIG. 6(f).

Similarly, the augmentation server 110 may augment the image 610 with video (or animation). As illustrated in FIG. 6(g), the augmentation server 110 may transmit 540 a banner that reads “Find more videos related to Jason Bourne” for overlay. Thereafter, on a mouse-over user action, the web browser may overlay a video player playing a Jason Bourne video, as illustrated in FIG. 6(h). As shown, the overlaid video player includes video controls such as forward, rewind, play/pause, and/or volume for the user to exercise control over the video being played.

The augmentation server 110 may also integrate functional resources such as web searches in still images, as illustrated in FIGS. 6(i) and 6(j). In FIG. 6(i), the augmentation server 110 transmits 540 a toolbar that reads “Search the web for info related to David Beckham” for overlay. Upon a mouse-over, the web browser launches a search tool overlaying the image 610, as illustrated in FIG. 6(j). It is noted that the toolbar and the search tool include the phrase “David Beckham.” In one embodiment, the included phrase is related to the context and/or subject of the image 610.

Alternate Embodiments

In one embodiment, the augmentation server 110 delivers rich media content without augmenting web pages ahead of time. The augmentation server 110 (or other entities) may install a plug-in module in the client web browser. The plug-in module monitors the client's browsing activity and transmits related information (e.g., the URL of the displayed web page) to the augmentation server 110 in real time (or on-the-fly). The augmentation server 110 may retrieve the web page and analyze it for qualified images and related keywords. The augmentation server 110 may transmit the information (e.g., identity of the qualified images or the related keywords) to the client web browser. If the user mouses over one of the qualified images (the active image), the plug-in module sends a signal to the augmentation server 110 identifying the active image. The augmentation server 110 may determine and transmit a relevant advertisement to the client web browser for display in a manner similar to the one described above with respect to FIG. 5.

In one embodiment, the web pages can be augmented on the client side (e.g., via plug-in modules in the client web browser) in a manner similar to the one described above with respect to FIGS. 3 and 4.

One of ordinary skill in the art will readily recognize that the described system and method are not limited to augmenting still images in web pages with advertisements and can be applied to augment any type of visual element in any type of document with any type of rich media content. Examples and detailed descriptions of an approach to augment keywords on web pages with relevant additional information are provided in U.S. Pat. No. 7,257,585, the disclosure of which is incorporated by reference in its entirety.

Advantages of the disclosed embodiments include providing content providers with additional channels for delivering relevant advertisements and other augmented data to viewers. Because the augmented data is displayed as an overlay of the current web page, the disclosed embodiments in essence establish a third dimension to present information (the first and second dimensions being the length and width of the web page), enabling viewers to access additional rich media content without leaving the current web page. This feature is especially useful for devices with limited screen space, such as mobile computing devices (e.g., handheld computers). The disclosed embodiments also enhance viewers' web browsing experience by providing rich media overlays without the need to leave their current web page. In addition, the disclosed embodiments provide additional rich media content on a web page without requiring a media player at a fixed location on the web page.

Some portions of the above description describe the embodiments in terms of algorithmic processes or operations. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs comprising instructions for execution by a processor or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times to refer to these arrangements of functional operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combinations thereof.

As used herein, any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.

Some embodiments may be described using the expression “coupled” and “connected” along with their derivatives. It should be understood that these terms are not intended as synonyms for each other. For example, some embodiments may be described using the term “connected” to indicate that two or more elements are in direct physical or electrical contact with each other. In another example, some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. The embodiments are not limited in this context.

As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Further, unless expressly stated to the contrary, “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).

In addition, the articles “a” and “an” are employed to describe elements and components of the embodiments herein. This is done merely for convenience and to give a general sense of the disclosure. This description should be read to include one or at least one, and the singular also includes the plural unless it is obvious that it is meant otherwise.

Upon reading this disclosure, those of skill in the art will appreciate still additional alternative structural and functional designs for a system and a process for augmenting visual elements in documents with rich media content. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the present invention is not limited to the precise construction and components disclosed herein and that various modifications, changes and variations which will be apparent to those skilled in the art may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope as defined in the appended claims.

I claim:
 1. A method for augmenting web pages with rich media content, the method comprising: analyzing, by a server, a web page to identify a visual element that is a qualified visual element; determining, by the server, a keyword associated with the qualified visual element; generating, by the server, an association of the visual element and the keyword; embedding, by the code, the association in an augmented web page corresponding to the web page; receiving, by the server, a request from a client computer corresponding to a pointer being positioned over the visual element in the augmented web page; and responsive to receiving the request, determining a piece of media content relevant to the visual element by searching for the advertisement using the keyword, and transmitting the piece of rich media content to the client computer for display.
 2. The method of claim 1, wherein the visual element comprises one of a still image, a video, or a Flash, and the piece of media content comprises an advertisement.
 3. The method of claim 1, wherein analyzing the web page to identify the qualified visual element comprises analyzing the web page for visual elements with a length no less than a minimum length, a height no less than a minimum height.
 4. The method of claim 1, wherein determining the keyword associated with the qualified visual element comprises: identifying context keywords in the web page, and identifying subject keywords in the web page by analyzing tags associated with the qualified visual element, text nodes near the qualified visual element, and areas of interest in the web page.
 5. The method of claim 1, wherein embedding the association in the augmented web page corresponding to the web page comprises embedding computer code in the augmented web page for monitoring user pointer movement on a display of the web page.
 6. The method of claim 1, wherein the piece of rich media content is displayable as an overlay in an area proximate to the visual element simultaneous to the pointer being positioned over the visual element.
 7. The method of claim 1, further comprising: tracking the request or the piece of rich media content.
 8. A non-transitory computer readable medium with stored instructions, the instructions when executed by a processor cause the processor to perform a method comprising: analyzing, by a server, a web page to identify a visual element that is a qualified visual element; determining, by the server, a keyword associated with the qualified visual element; generating, by the server, an association of the visual element and the keyword; embedding, by the code, the association in an augmented web page corresponding to the web page; receiving, by the server, a request from a client computer corresponding to a pointer being positioned over the visual element in the augmented web page; and responsive to receiving the request, determining a piece of media content relevant to the visual element by searching for the advertisement using the keyword, and transmitting the piece of rich media content to the client computer for display.
 9. The computer readable medium of claim 8, wherein the visual element comprises one of a still image, a video, or a Flash, and the piece of media content comprises an advertisement.
 10. The computer readable medium of claim 8, wherein analyzing the web page to identify the qualified visual element comprises analyzing the web page for visual elements with a length no less than a minimum length, a height no less than a minimum height.
 11. The computer readable medium of claim 8, wherein determining the keyword associated with the qualified visual element comprises: identifying context keywords in the web page, and identifying subject keywords in the web page by analyzing tags associated with the qualified visual element, text nodes near the qualified visual element, and areas of interest in the web page.
 12. The computer readable medium of claim 8, wherein embedding the association in the augmented web page corresponding to the web page comprises embedding computer code in the augmented web page for monitoring user pointer movement on a display of the web page.
 13. The computer readable medium of claim 8, wherein the piece of rich media content is displayable as an overlay in an area proximate to the visual element simultaneous to the pointer being positioned over the visual element.
 14. The computer readable medium of claim 8, wherein the method further comprises: tracking the request or the piece of rich media content.
 15. An augmentation system for augmenting web pages with rich media content, the system comprising: an input/output module for retrieving web pages, receiving requests, and transmitting augmented web pages and rich media content; an image augmentation module for analyzing a web page to identify a visual element that is a qualified visual element, determining a keyword associated with the qualified visual element, generating an association of the visual element and the keyword, and embedding the association in an augmented web page corresponding to the web page; an advertisement deliver module for receiving a request from a client computer corresponding to a pointer being positioned over the visual element in the augmented web page, responsive to receiving the signal, determining a piece rich media content relevant to the visual element by searching for the advertisement using the keyword, and transmitting the piece of media content to the client computer for display.
 16. The augmentation system of claim 15, wherein the visual element comprises one of a still image, a video, or a Flash, and the piece of media content comprises an advertisement.
 17. The augmentation system of claim 15, wherein the image augmentation module is further configured for analyzing the web page for visual elements with a length no less than a minimum length, a height no less than a minimum height.
 18. The augmentation system of claim 15, wherein the image augmentation module is further configured for identifying context keywords in the web page, and identifying subject keywords in the web page by analyzing tags associated with the qualified visual element, text nodes near the qualified visual element, and areas of interest in the web page.
 19. The augmentation system of claim 15, wherein the image augmentation module is further configured for embedding computer code in the augmented web page for monitoring user pointer movement on a display of the web page.
 20. The augmentation system of claim 15, wherein the piece of rich media content is displayable as an overlay in an area proximate to the visual element simultaneous to the pointer being positioned over the visual element.